Summarize by Aili

Making Peace with LLM Non-determinism

🌈 Abstract

The article discusses the challenges of non-determinism in large language models (LLMs) and explores ways to address them. It covers topics such as the misconceptions around sampling, the role of seeding, the impact of hardware non-determinism, and the inherent ambiguity and high-cardinality of language itself.

🙋 Q&A

[01] Making Peace with LLM Non-determinism

1. What are the key issues with non-determinism in LLMs that the author discusses?

  • The author is bothered by the non-determinism in LLMs, as it makes it difficult to reliably reproduce user journeys, steer the outputs, set up effective unit tests/monitoring, or guarantee product behavior.
  • The author initially thought that "greedier" sampling (e.g., temperature=0, top-p=0, top-k=1) would make the outputs more deterministic, but learned that while sampling parameters influence the perceived diversity of generated text, the sampling process itself is often pseudo-random.
  • The author discovered that even with proper seeding, determinism is not guaranteed: every hyperparameter must also be pinned (the choice of hyperparameters effectively becomes a deterministic optimization problem), and sparse Mixture of Experts (MoE) architectures can still vary because expert routing can depend on which other requests share the batch.
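The interaction between greedy decoding, temperature, and seeding can be sketched with a toy next-token distribution. The probabilities below are hypothetical, and the temperature rescaling is a simplification of what real samplers do with logits:

```python
import random

def sample_token(probs, temperature=1.0, seed=None):
    """Sample a token id from a toy next-token distribution.

    temperature=0 is treated as greedy (argmax); otherwise the
    probabilities are rescaled and sampled pseudo-randomly.
    """
    if temperature == 0:
        # Greedy decoding: deterministic, always the most likely token.
        return max(range(len(probs)), key=lambda i: probs[i])
    # Seeding makes the pseudo-randomness reproducible across runs.
    rng = random.Random(seed)
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return rng.choices(range(len(probs)), weights=[s / total for s in scaled])[0]

probs = [0.6, 0.3, 0.1]  # hypothetical next-token probabilities
assert sample_token(probs, temperature=0) == 0                       # greedy is deterministic
assert sample_token(probs, seed=42) == sample_token(probs, seed=42)  # same seed, same token
```

Note that an unseeded call may return any of the three tokens, which is the "perceived diversity" the summary refers to.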

2. What are the main causes of hardware non-determinism identified in the article?

  • Hardware non-determinism often comes from operation implementations that deliberately trade determinism for performance: bit-identical outputs across runs require a consistent order of floating-point operations, which is costly in terms of synchronization and memory access.
  • The problem of operation orders gets worse when inference is performed by a network of different devices, due to hardware variation and device parallelization.
  • Other potential sources of hardware non-determinism include factors like the chip's orientation relative to the sky.
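The operation-order issue above is at bottom a floating-point one: addition is not associative, so a parallel reduction that happens to sum in a different order can produce a result that differs in the last bits. A minimal illustration:

```python
# Floating-point addition is not associative, so the reduction order
# chosen by parallel hardware can change the result bit-for-bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # one summation order
right = a + (b + c)  # another summation order
print(left == right)  # → False
```

The absolute difference here is tiny, but in a deep network such discrepancies can compound and eventually flip which token has the highest probability.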

3. How does the author distinguish the challenge of non-determinism in LLMs from the inherent ambiguity and high-cardinality of language itself?

  • The author concludes that the main challenge in working with LLMs is not the residual non-determinism, but the inherent ambiguity and high-cardinality of language.
  • Language is both the input and output of LLMs, and small changes in language (e.g., punctuation, capitalization) can significantly shift the output distribution, which then propagates to the next token, model, or system.

[02] Strategies to Address Non-determinism and Language Challenges

1. What strategies does the author suggest to address the challenges of non-determinism and language in LLMs?

  • Make better models: Develop LLMs with lower perplexity and more steerability to address some of the issues.
  • Reduce unnecessary non-determinism: Control perturbations that don't contribute to the desired goal.
  • Reduce ambiguity:
    • Pre-processing: Consolidate inputs to reduce ambiguity (e.g., fix typos, use synonyms).
    • Caching: Cache outputs and route similar inputs to them, potentially with a perplexity-based "discover" page.
  • Reduce cardinality:
    • Guided decoding: Use guided decoding to constrain sampling to a subset of tokens (e.g., ones matching a grammar or schema).
    • Semantic evaluation: Use semantic metrics (e.g., BERTScore) to group semantically similar outputs.
    • Define clear axes for evaluation: Focus evaluations on specific axes to reduce the multi-faceted trade-offs of language outputs.
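The pre-processing and caching bullets can be combined into a minimal sketch. The `normalize` helper and `fake_model` below are hypothetical stand-ins for illustration, not the article's implementation:

```python
def normalize(prompt: str) -> str:
    # Consolidate trivially different inputs: case, surrounding
    # whitespace, and repeated internal spaces.
    return " ".join(prompt.lower().split())

_cache = {}

def cached_generate(prompt: str, generate) -> str:
    """Route equivalent prompts to one cached output.

    `generate` stands in for a hypothetical model call; only the
    normalized form of the prompt is used as the cache key.
    """
    key = normalize(prompt)
    if key not in _cache:
        _cache[key] = generate(key)
    return _cache[key]

calls = []
fake_model = lambda p: calls.append(p) or f"answer to: {p}"
a = cached_generate("What is RAG?", fake_model)
b = cached_generate("  what is   rag? ", fake_model)
assert a == b and len(calls) == 1  # second variant hits the cache
```

In practice the cache key could come from an embedding-based similarity lookup rather than exact normalized matching, at the cost of occasional false merges.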
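The guided-decoding bullet can also be illustrated with a toy greedy variant. The token ids and logit values are made up; real systems apply such masks inside the sampler:

```python
def constrained_greedy(logits, allowed_ids):
    """Pick the highest-logit token among an allowed subset only.

    Masking out disallowed tokens before selection is the essence of
    guided decoding: it shrinks the output cardinality by construction.
    """
    best = None
    for i in allowed_ids:
        if best is None or logits[i] > logits[best]:
            best = i
    return best

logits = [2.0, 5.0, 1.0, 4.0]  # hypothetical scores for 4 tokens
print(constrained_greedy(logits, allowed_ids={0, 2, 3}))  # → 3 (token 1 is excluded)
```

The unconstrained argmax would be token 1; the constraint forces the next-best allowed token instead.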

2. How does the author suggest embracing the inherent randomness and ambiguity of language in LLMs?

  • Natively robust UX: Language interfaces can provide a robust user experience, with clear affordances, encouragement to explore, and fair tolerance of bad inputs.
  • Exploration: Non-determinism in language models can be used effectively to explore ideas, especially when grounded with deterministic scaffolds like code verification.
Shared by Daniel Chen ·
© 2024 NewMotor Inc.